Super-Fast XML Wrapper Generation in DB2: A Demonstration
Abstract
The XML Wrapper is a new feature of the federated database capabilities of DB2/UDB v8. It enables users and applications to issue SQL queries against XML data from a variety of sources, including files and web services. The XML Wrapper treats hierarchical XML documents as families of virtual relational tables in a federated schema, which can then be queried to extract information from the XML and combine it with data from other sources. Because hierarchical data must be mapped onto relations, using the XML Wrapper is complex and requires several difficult steps: (i) The hierarchical schema of the source must be flattened to a relational form. (ii) Each relation of the flattened schema must be registered in DB2 as a NICKNAME, a complex virtual table definition containing several XPaths as specialized options. (iii) Each NICKNAME must be accompanied by a VIEW, again a complex structure involving join conditions. Chocolate is a tool that eases all three tasks: it provides several flattening strategies and an interface that lets users modify the automatically generated target schema. Once the user is satisfied with the schema, Chocolate automatically generates the corresponding NICKNAME and VIEW definitions.
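To make step (i) concrete, the following is a minimal sketch of one possible flattening strategy, not necessarily one that Chocolate implements. It assumes a hypothetical in-memory schema description (`Element`) and a hypothetical output record (`Relation`). Every repeating element becomes its own relation, non-repeating children are inlined as columns, and each column carries an XPath relative to its row element. The sample `customer` schema is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    """Node of a hypothetical hierarchical (XML-like) schema."""
    name: str
    attributes: List[str] = field(default_factory=list)
    children: List["Element"] = field(default_factory=list)
    repeating: bool = False  # True if it may occur many times under its parent


@dataclass
class Relation:
    """One flat relation derived from the hierarchy (hypothetical structure)."""
    name: str
    xpath: str                 # selects one row; relative to the parent row if nested
    columns: Dict[str, str]    # column name -> XPath relative to the row element
    parent: Optional[str] = None


def flatten(root: Element) -> List[Relation]:
    """Flatten a schema: repeating elements become relations, the rest are inlined."""
    relations: List[Relation] = []

    def new_relation(elem: Element, row_xpath: str, parent: Optional[str]) -> None:
        rel = Relation(elem.name, row_xpath, {}, parent)
        relations.append(rel)
        if not elem.children:
            # A repeating leaf contributes its own text content as a column.
            rel.columns[elem.name] = "."
        fill(elem, rel, ".")

    def fill(elem: Element, rel: Relation, path: str) -> None:
        for attr in elem.attributes:
            rel.columns[f"{elem.name}_{attr}"] = f"{path}/@{attr}"
        for child in elem.children:
            child_path = f"{path}/{child.name}"
            if child.repeating:
                # One-to-many nesting: split off a child relation.
                new_relation(child, child_path, rel.name)
            elif child.children or child.attributes:
                # One-to-one nesting: inline the child's fields into this relation.
                fill(child, rel, child_path)
            else:
                rel.columns[child.name] = child_path

    new_relation(root, f"//{root.name}", None)
    return relations


# Invented sample schema: a customer with an address and many orders of many items.
schema = Element("customer", ["id"], [
    Element("name"),
    Element("address", children=[Element("city"), Element("zip")]),
    Element("order", ["number"], [Element("item", repeating=True)], repeating=True),
])

relations = flatten(schema)
for rel in relations:
    print(rel.name, rel.parent, rel.xpath, rel.columns)
```

Each resulting `Relation` holds the kind of information that steps (ii) and (iii) need: a row-selecting XPath, per-column XPaths, and a parent link from which join conditions could be derived. The exact NICKNAME and VIEW syntax is left out here on purpose.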
Similar resources
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
XML and DB2
The eXtensible Markup Language (XML) is a key technology that facilitates both information exchange and e-business transactions. Starting with DB2 UDB Net.Data V1, an application can generate XML documents from SQL queries against DB2 or any ODBC-compliant database. Today DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but...
Supporting unified interface to wrapper generator in Integrated Information Retrieval
Given the ever-increasing scale and diversity of information and applications on the Internet, improving the technology of information retrieval is an urgent research objective. Retrieved information is either semi-structured or unstructured in format and its sources are extremely heterogeneous. In consequence, the task of efficiently gathering and extracting information from documents can be b...
Semantic Wrappers for Semi-Structured Data Extraction
In this paper, we propose an approach to extract information from HTML pages and to add semantic (XML) tags to them. Wrapping is an essential technique used to automatically extract information from Web sources. This paper describes both a general rule-based approach, which can be used to automatically generate wrappers, and an assistant generator wrapper called WebMantic. We also provide ...
Publication date: 2003